Journal of Soft Computing in Civil Engineering 7-4 (2023) 132-143 


Pouyan Press


Journal of Soft Computing in Civil Engineering 


Journal homepage: www.jsoftcivil.com 


Contents lists available at SCCE 


Spatial Data-Driven Traffic Flow Prediction Using Geographical Information System


Mehdi Babaei¹, Saeed Behzadi²*


1. M.Sc. Student in Geographic Information Systems, Department of Civil Engineering, Shahid Rajaee Teacher Training University, Tehran, Iran

2. Assistant Professor in Surveying Engineering, Department of Civil Engineering, Shahid Rajaee Teacher Training University, Tehran, Iran


Corresponding author: behzadi.saeed@gmail.com


https://doi.org/10.22115/SCCE.2023.346188.1460


ARTICLE INFO 


ABSTRACT 


Article history: 

Received: 08 June 2022 
Revised: 27 April 2023 
Accepted: 19 May 2023 


Keywords: 

Geographic information system; 
Traffic; 

Prediction; 

Traffic map. 


Today, traffic is one of the biggest problems of urban management. There are two general
approaches to traffic management: hard and soft methods. In the hard method, physical changes
are applied to the road network; in the soft method, the existing conditions are optimized.
Traffic forecasting is one of the soft methods. Forecasts are usually based on the time of
existing traffic conditions, while the effect of location and neighborhood, which is one of the
concepts of GIS science, is rarely considered. In this research, variables affecting traffic
were first identified. Then, five machine learning methods were used to predict traffic on all
city roads. The KNN method was selected as the best one, with an accuracy of 96.14% and a Kappa
of 0.95. Finally, the prediction map was prepared by applying the superior model and a
Geographic Information System (GIS). One advantage of the traffic prediction map is that it
makes traffic management easy for users and administrators.
1. Introduction


Traffic is one of the most important issues we face in our daily lives. Today, traffic in big
cities, especially in metropolitan areas, has become increasingly complex. The many facilities
and the cultural conditions of cities have increased migration from rural to urban areas, which
has led to the growth of the urban population and urbanization. The growing population has
raised the number of vehicles and, combined with the lack of adequate roads, has created a
traffic problem in cities.


The increase in traffic causes other problems such as air pollution, noise pollution,
environmental destruction, car accidents, high fuel consumption, and long travel times [1].
Accordingly, it is important to address the traffic problem and its solutions, and the
authorities are trying to reduce traffic with various methods.


In intelligent transportation systems, traffic prediction is a basic component of many control and 
monitoring systems. Traffic prediction is a necessary step to achieve time optimization in the 
urban traffic control system. Hence, traffic prediction is an important research area for intelligent 
traffic control and traffic guidance. Therefore, using effective methods to predict traffic flow 
with high accuracy is very important. 


Urban transportation planning officials are constantly trying to predict the future state of traffic 
on a network to take precautionary measures. If there was no prediction, we would see high 
traffic congestion and low service. 


Traffic flow forecasting has a fundamental role in various issues such as traffic light 
management, route planning, and so on. For example, knowing the flow of traffic gives drivers 
the ability to make better decisions. Drivers can decide instantly according to the traffic situation 
and choose alternative routes to reach the destination. 


In past years, various studies have been conducted on forecasting traffic, vehicle speed, and
other traffic parameters. However, less success has been achieved in reaching high precision
[2]. In this paper, we propose a solution to predict vehicular traffic using Machine Learning
(ML) techniques. First, the variables that affect traffic are identified. Then, five methods,
namely Decision Tree (DT), K-Nearest Neighbor (KNN), Discriminant Analysis, Naive Bayes, and
Neural Network (NN), are used to learn the effect of these variables on the traffic situation,
and the most accurate method is selected. Finally, the future traffic of the city can be
predicted by entering new data into the model, and a map of the city's future traffic can be
prepared using GIS. The main purpose of this study is to increase the accuracy of traffic
forecasting and to display forecast traffic maps. This study can help travelers and traffic
officials in the decision-making process, whether for travel or management purposes.


So far, various studies have been done on this subject. For example, Smith and Demetsky
developed four models (historical mean, neural network, time series, and nonparametric
regression) to test freeway traffic prediction. The results showed that the nonparametric model
performs better [3]. Zuo et al. created a new method for developing the weights of the k-NN
model [4]. Akbari et al. suggested a cluster k-NN model to prevent discontinuous data
interference in the database [5]. In 2015, Meng et al. proposed a two-step method based on the 
balanced binary tree and Advanced K-Nearest Neighbor (AKNN) techniques for predicting 
short-term traffic flow. The result showed that the k-NN model has a good result in predicting 
short-term traffic [6]. Xingyin Duan et al. studied the five urban areas using 1.4 billion GPS taxi 
records and gray correlation analysis to find factors that affect traffic congestion. They finally 
predicted traffic congestion through the BP neural network [7]. Su Su Hlaing et al. proposed RF
theory and NN methods to simulate complex, nonlinear processes and predict traffic flow and
road traffic accidents based on historical traffic data [8]. Aditya Rao et al. proposed a dynamic
traffic system that calculated the percentage of congestion and allocated the timer to each signal 
[9]. The proposed system used image processing techniques to process the video. Pavan 
Chhatpar et al. conducted a traffic forecast study in 2020. They provided predictive analysis of 
traffic in a given area using supervised learning techniques such as Back Propagation Neural 
Network (BPNN) [10]. They also designed an Android application to predict the traffic densities 
of the entire map areas in an offline mode using real-time traffic data. Zhang Tao et al. used the
non-parametric regression method of K-nearest neighbors to predict short-term traffic flow. The
result showed that the non-parametric regression method has high precision [11]. In 2002, Tan
Guozhen et al. constructed an intelligent neural network model based on the neural network 
using linear independent and sigmoid functions with tunable parameters. They found that the 
prediction convergence speed and accuracy are greatly improved compared with the traditional 
BPN [12]. Thammasak Thianniwet et al. used a decision tree learning algorithm to categorize 
GPS data. In this study, the level of traffic congestion was also determined by recording road 
traffic images [13]. Cynthia Jayapal et al. introduced a way to predict traffic for mobile users. 
Mobile phones are equipped with traffic applications that use GPS to identify locations. This data 
is sent to a remote server that predicts traffic congestion. The traffic is then transferred to the end 
user's phone [14]. In 2020, Gaurav Meena et al. used machine learning algorithms, Decision 
Trees (DT), Support Vector Machine (SVM), and Random Forest (RF) to predict traffic, while 
the RF algorithm performed better than the others [15]. Sharmila and Velaga (2020) used 
machine learning techniques such as Artificial Neural Networks (ANN) and SVM to estimate 
corridor-level travel time. They considered various factors such as road geometry, traffic 
variables, location information from the GPS receiver, and other spatiotemporal parameters that 
affect travel time. A k-fold cross-validation technique was then used to determine the optimum 
model parameters in the ANN and SVM models [16]. Srikanth and Mehr used a VISSIM 
microscopic traffic simulation model to generate traffic flow data. They compared the Adaptive 
Neural-Fuzzy Inference System (ANFIS), ANN, and Multiple Linear Regression (MLR) models 
to predict the number of passenger cars. The results showed that the ANFIS model estimates are 
closer to the real data than those of the MLR and ANN models [17]. In 2022, Ghasempoor and
Behzadi predicted the traffic in the coming days using the neural network algorithm based on the
collected traffic data. Traffic forecasting was done using basic neural network methods,
Feed-forward Levenberg-Marquardt, Conjugate Gradient Neural Networks, and Bayesian Neural
Networks. The results showed that the Feed-forward Levenberg-Marquardt method predicts
traffic data with 81.59% accuracy, the highest among the methods [18].
Andrew Moses and Parvathi R proposed a solution to predict vehicular traffic using machine
learning techniques. In their study, support vector regression, linear regression, decision
tree, and random forest algorithms were used, and random forests performed better than the
other methods [19].


Given the serious problem of traffic in most cities and its harmful effects, it is necessary to find 
solutions to solve this problem. Tehran is one of the densely populated cities where the traffic 
problem is the main problem of urban management. The high population and lack of proper 
transportation have caused traffic congestion in this city. In this research, all the main roads of 
the city are considered as the study area. Presenting the future traffic situation in the form of a 
map is very efficient for users, which is done in this research using GIS. In the present study, the 
introduction and background of the research are stated first. The second part introduces the 
methods and theories and examines the effectiveness of the proposed model. In the end, the 
result of the research and future research opportunities are presented. 


2. Methodology 


There are different methods to predict the traffic flow of a city, most of which are very expensive
and time-consuming. In this research, traffic forecasting was done using computational
intelligence algorithms. At first, the required data were identified and collected. Then the 
necessary pre-processing was done on the data. Next, the most suitable algorithm was selected 
by examining different ML algorithms. Figure 1 shows the flowchart of the proposed model. 


[Figure 1 (flowchart): data collection (traffic situation, coordinate X, coordinate Y, days of
the week, time, holidays status, weather condition), then data preprocessing, the use of ML
models, comparison of the models, choosing the best prediction model, and production of the
future traffic map of the city.]


Fig. 1. The flowchart of the method. 


Various variables affect traffic congestion. The data are provided by a Web GIS system designed
in [20], which collects the traffic data of the entire study area once every 15 minutes. The
data were collected from 4 to 10 April 2020. The study area is the city of Tehran, which extends
from 51° 07′ 10″ E to 51° 30′ 14″ E longitude and from 35° 36′ 01″ N to 35° 48′ 27″ N latitude,
at an altitude of 1495.9 m above Mean Sea Level (MSL). Traffic data are collected in point
format. Each point has a longitude and a latitude, both of which are used as input variables
[21,22]. Other variables affect traffic to different degrees; time, day of the week, holiday
status, and weather are among the most important [23].
The traffic pattern differs from day to day. For example, the day of the week plays a
significant role in the amount of traffic, so it is used as one of the variables (coded from 0
for Saturday to 6 for Friday). Time is another variable that affects traffic. Data are
collected 24 hours a day (0:00 to 23:59), and the time is stored as a number. Holidays also
affect traffic. On holidays, most people usually rest, which causes a sharp decrease in
traffic. The holiday status is a binary variable: one if the day is a holiday and zero
otherwise. Another important variable is the weather. Rainy weather, thunderstorms, snow, etc.
increase traffic congestion. Tehran weather data are stored 24 hours a day. In this article,
weather conditions are divided into three categories: 1, 2, and 3. Storms, thunder, and snow
increase traffic and are assigned category 3. Light rain, rain, drizzle, rain shower, and fog
are assigned category 2, and other weather conditions that do not affect traffic intensity are
assigned category 1. Table 1 shows the different weather conditions with their categories.


Table 1
Categories of different weather conditions.
Category | Condition | Category | Condition
2 | Showers in the Vicinity | 1 | Partly Cloudy
2 | Light Rain with Thunder | 1 | Fair
3 | T-Storm | 1 | Fair/Windy
3 | Thunder | 1 | Mostly Cloudy
1 | Fog | 1 | Cloudy
1 | Wintry Mix | 2 | Light Rain
2 | Drizzle | 2 | Light Rain Shower / Windy
2 | Light Drizzle | 2 | Rain Shower / Windy
3 | Thunder in the Vicinity | 2 | Rain
3 | Snow | 2 | Rain Shower
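
The encoding of the input variables described above can be sketched as follows. This is only an illustration: the names `WEATHER_CATEGORY`, `DAY_CODE`, and `encode_record` are hypothetical and do not come from the paper, and the weather lookup simply transcribes Table 1 as printed.

```python
# Hypothetical helpers for illustration; none of these names appear in the paper.

# Weather categories transcribed from Table 1.
WEATHER_CATEGORY = {
    "Partly Cloudy": 1, "Fair": 1, "Fair/Windy": 1, "Mostly Cloudy": 1,
    "Cloudy": 1, "Fog": 1, "Wintry Mix": 1,
    "Showers in the Vicinity": 2, "Light Rain with Thunder": 2, "Drizzle": 2,
    "Light Drizzle": 2, "Light Rain": 2, "Light Rain Shower / Windy": 2,
    "Rain Shower / Windy": 2, "Rain": 2, "Rain Shower": 2,
    "T-Storm": 3, "Thunder": 3, "Thunder in the Vicinity": 3, "Snow": 3,
}

# Day-of-week codes as described in the text: 0 = Saturday ... 6 = Friday.
DAY_CODE = {"Saturday": 0, "Sunday": 1, "Monday": 2, "Tuesday": 3,
            "Wednesday": 4, "Thursday": 5, "Friday": 6}


def encode_record(x, y, day_name, time_value, is_holiday, weather):
    """Return the six input variables as a list of numbers."""
    return [x, y, DAY_CODE[day_name], time_value,
            1 if is_holiday else 0, WEATHER_CATEGORY[weather]]


# Example: a non-holiday Saturday reading in fair weather.
record = encode_record(531637.216, 3951732.410, "Saturday", 6.15, False, "Fair")
```

A lookup table keeps the category assignment in one place, so a condition can be moved to another category without touching the rest of the pipeline.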


In this system, a class is assigned to each point, which indicates the traffic situation at that point. 
These collected traffic data are divided into five classes: no traffic (1), low traffic (2), medium 
traffic (3), high traffic (4), and very high traffic (5). Two million record samples are collected 
during one week. Table 2 shows a view of the collected data. 


Table 2
Sample traffic data.
No. | Traffic situation | Coordinate X | Coordinate Y | Days of the Week | Time | Holidays Status | Weather Condition
1 | 1 | 531637.216 | 3951732.410 | 0 | 6.15 | 0 | 1
2 | 3 | 538180.386 | 3944881.363 | 0 | 5.23 | 0 | 2
3 | 5 | 531523.288 | 3947692.771 | 2 | 16.40 | 0 | 1
4 | 2 | 537917.365 | 3950958.017 | 3 | 16.37 | 0 | 1
5 | 4 | 544537.347 | 3950760.197 | 4 | 14.37 | 0 | 1
6 | 1 | 532012.245 | 3951173.183 | 5 | 12.10 | 0 | 1
7 | 5 | 532770.493 | 3946938.622 | 5 | 19.4 | 0 | 1
8 | 4 | 542575.722 | 3949435.199 | 6 | 12.65 | 1 | 3
9 | 2 | 531617.096 | 3946225.768 | 6 | 21.70 | 1 | 1
10 | 3 | 542158.483 | 3953776.624 | 6 | 22.18 | 1 | 1
The possibility of incorrect or inconsistent records in Big Data is very high, and such outliers
disrupt data mining analysis. In this stage, the data are therefore prepared for the data mining
process, which increases the quality of the output results. Figure 2 shows the total data
collected in five classes after data preprocessing, covering all the streets of the city.


[Figure 2: plot titled "Original data set: Tehran Traffic Dataset".]

Fig. 2. Tehran City traffic dataset.


Machine learning is extensively used in many fields, and large amounts of data empower
machine learning methods [19]. The goal of machine learning is to allow the computer to learn
automatically without human intervention and to adjust its actions accordingly [24–26].
The main purpose of machine learning algorithms is to generalize beyond the training
examples.


In the present study, five methods are used to predict traffic. The decision tree is a model that
provides a tree-like structure for deciding and classifying particular data. The tree consists
of the root (the topmost node), internal nodes, branches, and leaf nodes. Each internal node
represents a question about a variable, each branch leaving the node represents an answer to
that question, and each leaf node gives the predicted class [27].
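
Table 3 lists Gini's diversity index as the split criterion of the decision tree. The sketch below shows how such a criterion can score a candidate split. It is a minimal illustration with hypothetical function names, not the authors' implementation; a lower weighted index means purer child nodes.

```python
from collections import Counter


def gini_index(labels):
    """Gini diversity index of a list of class labels: 1 - sum of squared class shares."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def split_gini(left, right):
    """Gini index of a candidate split, weighted by the size of each child node."""
    n = len(left) + len(right)
    return (len(left) / n) * gini_index(left) + (len(right) / n) * gini_index(right)


# A split that separates traffic classes 1 and 5 perfectly scores 0.
perfect = split_gini([1, 1, 1], [5, 5, 5])
# A split that leaves both children evenly mixed scores 0.5.
mixed = split_gini([1, 5], [1, 5])
```

When growing a tree, the question placed at an internal node is the one whose split gives the lowest weighted index among the candidates.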


The KNN method is another classification method. KNN is a common nonparametric method and one
of the simplest machine-learning algorithms. It is used in data mining, machine learning, and
pattern recognition. Since KNN is easy to use and implement, it is considered one of the top
ten algorithms in the field of data mining [28]. This method is commonly used in forecast
analysis to classify a point based on its neighbors [29]. For these two methods (decision tree
and KNN), k-fold cross-validation with k = 25 is used for validation. In this method, the data
are divided into k sections, with each sample randomly placed in one of them. Training and
testing are then repeated k times; in each repetition, k-1 sections are used as training data
and the remaining one as test data. In KNN prediction, it is important to choose the number of
neighbors appropriately. The number of neighbors (K) is usually determined empirically. In this
experiment, K is varied from 1 to 20, and the best accuracy is obtained at K = 1.
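
The procedure above (KNN evaluated by k-fold cross-validation over a range of neighbor counts) can be sketched in plain Python. The city-block distance and inverse-distance weighting follow the settings listed in Table 3; the function names, the fold assignment, and the absence of feature scaling are assumptions made for illustration only.

```python
import random
from collections import defaultdict


def city_block(a, b):
    """City-block (Manhattan) distance between two feature vectors."""
    return sum(abs(u - v) for u, v in zip(a, b))


def knn_predict(train_X, train_y, query, k):
    """Classify `query` by an inverse-distance-weighted vote of its k nearest neighbors."""
    nearest = sorted((city_block(x, query), label)
                     for x, label in zip(train_X, train_y))[:k]
    votes = defaultdict(float)
    for dist, label in nearest:
        votes[label] += 1.0 / (dist + 1e-12)  # small constant avoids division by zero
    return max(votes, key=votes.get)


def k_fold_accuracy(X, y, k_neighbors, n_folds=25, seed=0):
    """Pooled accuracy over n_folds folds; every sample is tested exactly once."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)          # random assignment to folds
    folds = [indices[i::n_folds] for i in range(n_folds)]
    correct = 0
    for fold in folds:
        held_out = set(fold)
        train_X = [X[i] for i in indices if i not in held_out]
        train_y = [y[i] for i in indices if i not in held_out]
        correct += sum(knn_predict(train_X, train_y, X[i], k_neighbors) == y[i]
                       for i in fold)
    return correct / len(X)


def best_k(X, y, candidates=range(1, 21), n_folds=25):
    """Return the candidate neighbor count with the highest cross-validated accuracy."""
    return max(candidates, key=lambda k: k_fold_accuracy(X, y, k, n_folds))
```

This brute-force search is quadratic in the number of samples, so it is only practical on a small subsample; for a data set of two million records a spatial index or a library implementation would be needed.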


The next method is discriminant analysis, which is a multivariate statistical analysis to separate 
two or more groups of observations based on k variables measured on each sample and find the 
contribution of each variable in separating the groups [30]. 


The Naive Bayes classifier is another method, which is based on Bayes' theorem. For a given
sample, the probability of each class is evaluated under the assumption that the input
variables are independent of one another given the class. Bayesian theory can be used to
predict the future based on current events according to the theory of statistics and
probability [27,31].


The last method uses Artificial Neural Networks (ANNs) to determine the pattern of urban road
traffic. These networks are modeled on biological neural networks. ANNs can be trained to find
patterns and classify information by imitating the human brain [28]. ANNs are used in various
fields such as simulation, pattern recognition, learning, etc. [32]. In an ANN, artificial
neurons and synapses form the nodes and edges of the network graph, respectively. ANNs are
divided into two categories: feed-forward networks (one-way directed graphs) and feedback
networks (bidirectional graphs). Feed-forward neural networks represent non-linear functional
mappings between a set of input and output variables [33]. Here, a two-layer feed-forward
network with sigmoid hidden neurons and softmax output neurons is used. 70% of the input data
are selected as the training samples, which the model uses for the training procedure. 15% of
the data are then selected for testing, and 15% are selected for validation [32].


The network is trained with scaled conjugate gradient back-propagation. The hidden part of the
network consists of three layers with 30, 45, and 55 neurons, respectively. The number of layers
and neurons is selected according to computer performance and time constraints: increasing the
number of layers and neurons lengthens the processing time, while reducing it decreases the
accuracy.
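
To make the network structure concrete, the following sketch runs one forward pass through a feed-forward network with sigmoid hidden layers of 30, 45, and 55 neurons and a softmax output over the five traffic classes. The weights are random and untrained, the input values are made up, and training by scaled conjugate gradient is not shown; all names are hypothetical.

```python
import math
import random


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    """Numerically stable softmax: subtract the maximum before exponentiating."""
    m = max(zs)
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def init_layer(n_in, n_out, rng):
    """Small random weights and zero biases for one fully connected layer."""
    weights = [[rng.uniform(-0.1, 0.1) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def forward(x, layers):
    """Apply sigmoid on every hidden layer and softmax on the last layer."""
    activation = x
    for index, (weights, biases) in enumerate(layers):
        z = [sum(w * a for w, a in zip(row, activation)) + b
             for row, b in zip(weights, biases)]
        is_output = index == len(layers) - 1
        activation = softmax(z) if is_output else [sigmoid(v) for v in z]
    return activation


rng = random.Random(0)
sizes = [6, 30, 45, 55, 5]  # 6 inputs, hidden layers of 30/45/55, 5 traffic classes
layers = [init_layer(a, b, rng) for a, b in zip(sizes, sizes[1:])]

# Made-up, already scaled input: X, Y, day, time, holiday, weather.
probs = forward([0.5, 0.5, 0.0, 0.26, 0.0, 1.0], layers)
predicted_class = probs.index(max(probs)) + 1  # classes are numbered 1 to 5
```

The softmax output can be read as one probability per traffic class, and the class with the largest probability is taken as the prediction.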


Accuracy and kappa criteria are used to compare and evaluate the models. These criteria are
calculated using the confusion matrix. Accuracy is the overall accuracy of classification, which
represents the ratio of correctly classified pixels (the sum of the main-diagonal elements of
the confusion matrix) to the sum of all observed pixels (Equation 1) [15,23,34].

$$\text{Accuracy} = \frac{\text{sum of the main-diagonal pixels}}{\text{sum of the total pixels}} \times 100 \qquad (1)$$

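Precision is another measure of model accuracy. For a given class, precision is the ratio of
the samples correctly assigned to that class (true positives, TP) to all samples the algorithm
assigned to that class (true positives plus false positives, FP). Equation 2 shows the
calculation of this measure [15,35].

$$\text{Precision} = \frac{TP}{TP + FP} \times 100 \qquad (2)$$
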
The sensitivity (recall) criterion is the ratio of the samples that the algorithm correctly
classified in the positive category (TP) to the total number of positive samples, that is, true
positives plus false negatives (FN). This criterion is obtained from Equation 3 [15,27,35].

$$\text{Recall} = \frac{TP}{TP + FN} \times 100 \qquad (3)$$
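
For a five-class problem, TP, FP, and FN are counted separately for each class. The sketch below derives per-class precision and recall from a confusion matrix, assuming rows hold the true classes and columns the predicted classes. The paper does not state how the per-class values were combined into the single figures it reports, so no averaging is shown; the function name and the example matrix are hypothetical.

```python
def per_class_precision_recall(cm):
    """Return (precision, recall) in percent for each class.

    cm[i][j] is the number of samples of true class i predicted as class j.
    """
    k = len(cm)
    results = []
    for c in range(k):
        tp = cm[c][c]
        fp = sum(cm[r][c] for r in range(k)) - tp  # predicted c, actually another class
        fn = sum(cm[c]) - tp                       # actually c, predicted another class
        precision = 100.0 * tp / (tp + fp) if tp + fp else 0.0
        recall = 100.0 * tp / (tp + fn) if tp + fn else 0.0
        results.append((precision, recall))
    return results


# Made-up two-class confusion matrix for illustration.
example = per_class_precision_recall([[8, 2],
                                      [1, 9]])
```

For the first class of this example, TP = 8, FP = 1, and FN = 2, which gives a precision of about 88.9% and a recall of 80%.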


Kappa is a statistical coefficient that is used to calculate inter-rater reliability for
qualitative items (Equation 4) [36]. The kappa coefficient considers the probability of
agreement occurring by chance [37].

$$\text{kappa} = \frac{n \sum_{i=1}^{k} n_{ii} - \sum_{i=1}^{k} n_{i+}\, n_{+i}}{n^{2} - \sum_{i=1}^{k} n_{i+}\, n_{+i}} \qquad (4)$$

where $n$ is the total number of observed pixels, $k$ is the number of classes, $n_{ii}$ is the
$i$th main-diagonal element of the confusion matrix, $n_{i+}$ is the sum of the elements of the
$i$th row, and $n_{+i}$ is the sum of the elements of the $i$th column.
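
As a worked illustration of Equations 1 and 4, the following sketch computes overall accuracy and kappa from a confusion matrix given as a list of rows. The function name and the example matrix are hypothetical.

```python
def accuracy_and_kappa(cm):
    """Return (accuracy in percent, kappa) for a square confusion matrix."""
    k = len(cm)
    n = sum(sum(row) for row in cm)                        # total observations
    diagonal = sum(cm[i][i] for i in range(k))             # correctly classified
    row_totals = [sum(cm[i]) for i in range(k)]            # n_{i+}
    col_totals = [sum(cm[i][j] for i in range(k)) for j in range(k)]  # n_{+i}
    chance = sum(r * c for r, c in zip(row_totals, col_totals))
    accuracy = 100.0 * diagonal / n
    kappa = (n * diagonal - chance) / (n * n - chance)
    return accuracy, kappa


# Made-up two-class confusion matrix for illustration.
acc, kappa = accuracy_and_kappa([[8, 2],
                                 [1, 9]])
```

For this matrix, n = 20, the diagonal sum is 17, and the chance term is 10·9 + 10·11 = 200, so the accuracy is 85% and kappa = (340 − 200) / (400 − 200) = 0.7.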


A kappa smaller than zero indicates very poor model performance. A kappa between 0.00 and 0.20
shows poor performance, between 0.21 and 0.40 below-average performance, between 0.41 and 0.60
average performance, between 0.61 and 0.80 good performance, and between 0.81 and 1.00
excellent performance [36]. The results of the five prediction models, along with their main
parameters, are shown in Table 3.
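
The interpretation bands above can be expressed as a small lookup. The function name is hypothetical; the thresholds are those quoted in the text.

```python
def interpret_kappa(kappa):
    """Map a kappa coefficient to the qualitative bands quoted in the text."""
    if kappa < 0.0:
        return "very poor"
    if kappa <= 0.20:
        return "poor"
    if kappa <= 0.40:
        return "below average"
    if kappa <= 0.60:
        return "average"
    if kappa <= 0.80:
        return "good"
    return "excellent"


# Applied to the kappa value reported for KNN in the abstract.
knn_band = interpret_kappa(0.95)
```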


Table 3
Accuracy, precision, recall, and kappa coefficient of the five models.
Model | Main Parameters | Accuracy | Precision | Recall | Kappa
Decision Trees | Max Number of Splits: 300; Split Criterion: Gini's Diversity Index | [illegible] | [illegible] | [illegible] | [illegible]
KNN | K = 1; Distance Metric: City Block; Distance Weight: Inverse | 96.14% | 96.42% | 96.91% | 0.95
Discriminant Analysis | Discrimtype: Quadratic | 41.57% | 40.4% | 42.52% | 0.27
Naive Bayes | Kernel type: Gaussian; Support: Unbounded | 52.14% | 52.53% | 52.69% | 0.38
Neural Network | Number of Hidden Nodes = 30, 45, 55; Epochs = 1000; Function: scaled conjugate gradient back-propagation | 57.76% | 58.15% | 58.89% | 0.45


As seen in Table 3, the Discriminant Analysis method has the lowest accuracy among the models.
The Decision Trees, Naive Bayes, and Neural Network methods are also less accurate than KNN.
The KNN method performed better than the other methods, with high accuracy. Therefore, the KNN
method is used to predict traffic.


A traffic map gives a better view of traffic. Traffic maps have some advantages; for example,
they are simple and understandable for everyone. Traffic maps can help the public to know the
future traffic and help officials and managers to manage traffic [38]. To show future traffic on
the map, the six input variables (coordinate X, coordinate Y, day of the week, time, holiday
status, and weather condition) must be given to the model. Therefore, about 80,000 new points
are first created on all the streets of the city using GIS. These points are created at regular
intervals from each other, and each has an X and a Y coordinate. The rest of the input values
are determined by the prediction scenario, that is, the day, time, holiday status, and weather
for which the forecast is wanted. The model output is the predicted future traffic situation at
each point. Using the selected algorithm, the traffic class for each point is specified as a
number between 1 and 5. The numbers 1 to 5 are then displayed in green, yellow, orange, red,
and dark red, respectively. Figure 3 shows the predicted traffic map for the city.
Fig. 3. The predicted traffic map of the city.

Figure 3 shows the amount of traffic at every location. The traffic map allows us to see the
future traffic on all the streets of the city at the selected time. As seen, the traffic
condition of every road is clearly visible.
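
The map-production step can be sketched as follows: points are placed at a regular spacing along a street centerline, each point is combined with the scenario variables, and the predicted class is mapped to a display color. The spacing value, the function names, and the `predict` callable (a stand-in for the trained model) are assumptions for illustration; the paper does not state the interval it used.

```python
import math

# Display colors for traffic classes 1 to 5, as listed in the text.
CLASS_COLORS = {1: "green", 2: "yellow", 3: "orange", 4: "red", 5: "dark red"}


def points_along(polyline, spacing):
    """Return points `spacing` units apart along a polyline of (x, y) vertices."""
    points = [polyline[0]]
    carried = 0.0  # distance travelled since the last emitted point
    for (x0, y0), (x1, y1) in zip(polyline, polyline[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        pos = spacing - carried  # offset of the next point along this segment
        while pos <= seg:
            t = pos / seg
            points.append((x0 + t * (x1 - x0), y0 + t * (y1 - y0)))
            pos += spacing
        carried = seg - (pos - spacing)
    return points


def build_map_rows(points, predict, day, time_value, holiday, weather):
    """Attach a predicted traffic class and its display color to each point."""
    rows = []
    for x, y in points:
        traffic_class = predict([x, y, day, time_value, holiday, weather])
        rows.append((x, y, traffic_class, CLASS_COLORS[traffic_class]))
    return rows


# Hypothetical street with one bend, sampled every 2 map units.
street = [(0.0, 0.0), (3.0, 0.0), (3.0, 4.0)]
sample_points = points_along(street, 2.0)
# Dummy predictor standing in for the trained model: always "medium traffic".
rows = build_map_rows(sample_points, lambda features: 3, 0, 8.0, 0, 1)
```

The resulting rows can be written to a point layer and symbolized by the color column in a GIS package.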


3. Conclusions 


Urban traffic is one of the most important and complex issues that exist in most metropolises and 
countries. The traffic pattern may change due to issues such as weekends, school reopening, and 
the morning and evening rush hours. So far, various solutions and suggestions have been 
presented, each of which has its advantages and disadvantages. 


In this article, traffic prediction was used as a solution for traffic control and management.
Traffic data were collected and pre-processed. Five machine-learning techniques were then used
to model the traffic, and the outputs of these models were compared using different criteria.
The Decision Trees, Discriminant Analysis, Naive Bayes, and Neural Network techniques performed
poorly, whereas the KNN model gave the best result, with an accuracy of 96.14% and a kappa
coefficient of 0.95.


The most important limitation of the current research was the collection and control of variables 
affecting urban traffic. Due to the exclusivity of traffic data, it was not possible to collect raw 
traffic data. These data were collected indirectly using web techniques. The large volume of 
input data made the problem lean toward big data. Therefore, this issue requires special big data 
analysis. There were also different classification models and algorithms, but it was not possible 
to use some of them due to the special structure of some input data. 


Next, a traffic map was used to show the future traffic situation of the city for a better view. 
Traffic maps can help the public to know the future traffic as well as officials and managers to 
manage traffic. One of the most important advantages of the current model is the 
comprehensiveness of the model for all city streets. 


The proposed method is cheap and reliable. It can predict traffic offline, which is its most
important advantage. Since neighborhood and location are the main factors affecting traffic
congestion, these two variables were considered in the modeling based on GIS concepts. The high
accuracy of the model and the preparation of a spatial map are other advantages of the current
model. Future research can increase prediction accuracy by using long-term data or data from
one or more seasons of the year.